1.
Nature ; 623(7989): 1070-1078, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37968394

ABSTRACT

Three billion years of evolution has produced a tremendous diversity of protein molecules, but the full potential of proteins is likely to be much greater. Accessing this potential has been challenging for both computation and experiments because the space of possible protein molecules is much larger than the space of those likely to have functions. Here we introduce Chroma, a generative model for proteins and protein complexes that can directly sample novel protein structures and sequences, and that can be conditioned to steer the generative process towards desired properties and functions. To enable this, we introduce a diffusion process that respects the conformational statistics of polymer ensembles, an efficient neural architecture for molecular systems that enables long-range reasoning with sub-quadratic scaling, layers for efficiently synthesizing three-dimensional structures of proteins from predicted inter-residue geometries and a general low-temperature sampling algorithm for diffusion models. Chroma achieves protein design as Bayesian inference under external constraints, which can involve symmetries, substructure, shape, semantics and even natural-language prompts. The experimental characterization of 310 proteins shows that sampling from Chroma results in proteins that are highly expressed, fold and have favourable biophysical properties. The crystal structures of two designed proteins exhibit atomistic agreement with Chroma samples (a backbone root-mean-square deviation of around 1.0 Å). With this unified approach to protein design, we hope to accelerate the programming of protein matter to benefit human health, materials science and synthetic biology.


Subject(s)
Algorithms, Computer Simulation, Protein Conformation, Proteins, Humans, Bayes Theorem, Directed Molecular Evolution, Machine Learning, Molecular Models, Protein Folding, Proteins/chemistry, Proteins/metabolism, Semantics, Synthetic Biology/methods, Synthetic Biology/trends
2.
Proc Natl Acad Sci U S A ; 115(44): E10342-E10351, 2018 Oct 30.
Article in English | MEDLINE | ID: mdl-30322927

ABSTRACT

Many applications in protein engineering require optimizing multiple protein properties simultaneously, such as binding one target but not others or binding a target while maintaining stability. Such multistate design problems require navigating a high-dimensional space to find proteins with desired characteristics. A model that relates protein sequence to functional attributes can guide design to solutions that would be hard to discover via screening. In this work, we measured thousands of protein-peptide binding affinities with the high-throughput interaction assay amped SORTCERY and used the data to parameterize a model of the alpha-helical peptide-binding landscape for three members of the Bcl-2 family of proteins: Bcl-xL, Mcl-1, and Bfl-1. We applied optimization protocols to explore extremes in this landscape to discover peptides with desired interaction profiles. Computational design generated 36 peptides, all of which bound with high affinity and specificity to just one of Bcl-xL, Mcl-1, or Bfl-1, as intended. We designed additional peptides that bound selectively to two out of three of these proteins. The designed peptides were dissimilar to known Bcl-2-binding peptides, and high-resolution crystal structures confirmed that they engaged their targets as expected. Excellent results on this challenging problem demonstrate the power of a landscape modeling approach, and the designed peptides have potential uses as diagnostic tools or cancer therapeutics.


Subject(s)
Peptides/chemistry, Peptides/metabolism, Animals, Apoptosis Regulatory Proteins/metabolism, Cell Line, Escherichia coli/metabolism, Humans, Mice, Myeloid Cell Leukemia Sequence 1 Protein/metabolism, Protein Binding/physiology, Protein Engineering/methods, Proto-Oncogene Proteins c-bcl-2/metabolism, Yeasts/metabolism, bcl-X Protein/metabolism
3.
PLoS One ; 11(6): e0157484, 2016.
Article in English | MEDLINE | ID: mdl-27322383

ABSTRACT

Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly available Affymetrix HG-U133Plus2 arrays. After strict quality control and data normalisation the data was quantified in an expression matrix of ∼20,000 genes and ∼28,000 samples. To enable different ways of sample grouping, existing annotations were subjected to systematic ontology-assisted categorisation and manual curation. Groups like normal tissues, neoplasmic tissues, cell lines, homoeotic cells and incompletely differentiated cells were created. Unsupervised analysis of the data confirmed global structure of expression consistent with earlier analysis but with more details revealed due to increased resolution. A suitable mixed-effects linear model was used to further investigate gene expression in solid tissue tumours, and to compare these with the respective healthy solid tissues. The analysis identified 1,285 genes with systematic expression change in cancer. The list is significantly enriched with known cancer genes from large, public, peer-reviewed databases, whereas the remaining ones are proposed as new cancer gene candidates. The compiled dataset is publicly available in the ArrayExpress Archive. It contains the most diverse collection of biological samples, making it the largest systematically annotated gene expression dataset of its kind in the public domain.


Subject(s)
Tumor Biomarkers/biosynthesis, Neoplastic Gene Expression Regulation, Neoplasm Proteins/biosynthesis, Neoplasms/genetics, Tumor Biomarkers/genetics, Cell Cycle/genetics, Cell Differentiation/genetics, Cell Division/genetics, Computational Biology, DNA Replication/genetics, Genetic Databases, Humans, Neoplasm Proteins/genetics, Neoplasms/pathology, Oligonucleotide Array Sequence Analysis, Principal Component Analysis, Protein Array Analysis
4.
BMC Genomics ; 14 Suppl 3: S9, 2013.
Article in English | MEDLINE | ID: mdl-23819581

ABSTRACT

BACKGROUND: It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures and their interactions. RESULTS: To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs, from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNPs at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies.
CONCLUSIONS: Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.


Subject(s)
Biological Adaptation/genetics, Amino Acid Substitution/genetics, Genome-Wide Association Study/methods, Molecular Models, Phenotype, Single Nucleotide Polymorphism/genetics, Proteins/genetics, Allosteric Regulation, Anaerobiosis, Animals, Computational Biology, Drosophila melanogaster, Genetic Models, Protein Interaction Maps/genetics, Signal Transduction/genetics, Systems Biology/methods
5.
Bioinformatics ; 28(10): 1402-3, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22474121

ABSTRACT

MOTIVATION: Meta-analysis of large gene expression datasets obtained from public repositories requires consistently annotated data. Curation of such experiments, however, is an expert activity which involves repetitive manipulation of text. Existing tools for automated curation are few, which bottlenecks the analysis pipeline. RESULTS: We present MageComet, a web application for biologists and annotators that facilitates the re-annotation of gene expression experiments in MAGE-TAB format. It incorporates data mining, automatic annotation, use of ontologies and data validation to improve the consistency and quality of experimental meta-data from the ArrayExpress Repository.


Subject(s)
Genetic Databases, Internet, Molecular Sequence Annotation, Data Mining, Meta-Analysis as Topic, Transcriptome
6.
Genetics ; 189(3): 951-66, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21890743

ABSTRACT

How genomic diversity within bacterial populations originates and is maintained in the presence of frequent recombination is a central problem in understanding bacterial evolution. Natural populations of Borrelia burgdorferi, the bacterial agent of Lyme disease, consist of diverse genomic groups co-infecting single individual vertebrate hosts and tick vectors. To understand mechanisms of sympatric genome differentiation in B. burgdorferi, we sequenced and compared 23 genomes representing major genomic groups in North America and Europe. Linkage analysis of >13,500 single-nucleotide polymorphisms revealed pervasive horizontal DNA exchanges. Although three times more frequent than point mutation, recombination is localized and weakly affects genome-wide linkage disequilibrium. We show by computer simulations that, while enhancing population fitness, recombination constrains neutral and adaptive divergence among sympatric genomes through periodic selective sweeps. In contrast, simulations of frequency-dependent selection with recombination produced the observed pattern of a large number of sympatric genomic groups associated with major sequence variations at the selected locus. We conclude that negative frequency-dependent selection targeting a small number of surface-antigen loci (ospC in particular) sufficiently explains the maintenance of sympatric genome diversity in B. burgdorferi without adaptive divergence. We suggest that pervasive recombination makes it less likely for local B. burgdorferi genomic groups to achieve host specialization. B. burgdorferi genomic groups in the northeastern United States are thus best viewed as constituting a single bacterial species, whose generalist nature is a key to its rapid spread and human virulence.


Subject(s)
Borrelia burgdorferi/genetics, Genetic Variation/genetics, Bacterial Genome/genetics, Lyme Disease/microbiology, Genetic Recombination/genetics, Genetic Selection, Sympatry/genetics, Physiological Adaptation/genetics, Animals, Borrelia burgdorferi/physiology, Conserved Sequence, Molecular Evolution, Gene Conversion/genetics, Genetic Speciation, Humans, Genetic Models, Phylogeny, Reproducibility of Results, Sequence Alignment, Nucleic Acid Sequence Homology